A DATA-DRIVEN BLOCK THRESHOLDING APPROACH TO
WAVELET ESTIMATION
University of Pennsylvania and Yale University



A data-driven block thresholding procedure for wavelet regression 
is proposed and its theoretical and numerical properties are investi- 
gated. The procedure empirically chooses the block size and threshold 
level at each resolution level by minimizing Stein's unbiased risk estimate.
The estimator is sharp adaptive over a class of Besov bodies
and simultaneously achieves within a small constant factor of the
minimax risk over a wide collection of Besov bodies including both
the "dense" and "sparse" cases. The procedure is easy to implement.
Numerical results show that it has superior finite sample performance 
in comparison to the other leading wavelet thresholding estimators. 

1. Introduction. Consider the nonparametric regression model

(1) $y_i = f(t_i) + \sigma z_i$, $i = 1, 2, \ldots, n$,

where $t_i = i/n$, $\sigma$ is the noise level and the $z_i$'s are independent standard normal
variables. The goal is to estimate the unknown regression function $f(\cdot)$ based
on the sample $\{y_i\}$.

Wavelet methods have demonstrated considerable success in nonparamet- 
ric regression. They achieve a high degree of adaptivity through thresholding 
of the empirical wavelet coefficients. Standard wavelet approaches threshold 
the empirical coefficients term by term based on their individual magnitudes. 
See, for example, Donoho and Johnstone (1994a), Gao (1998) and Antoniadis 
and Fan (2001). More recent work has demonstrated that block threshold- 
ing, which simultaneously keeps or kills all the coefficients in groups rather 
than individually, enjoys a number of advantages over the conventional term- 
by-term thresholding. Block thresholding increases estimation precision by 
utilizing information about neighboring wavelet coefficients and allows the
balance between variance and bias to be varied along the curve which results 
in adaptive smoothing. The degree of adaptivity, however, depends on the 
choice of block size and threshold level. 

The idea of block thresholding can be traced back to Efromovich (1985) 
in orthogonal series estimators. In the context of wavelet estimation, global 
level-by-level thresholding was discussed in Donoho and Johnstone (1995) 
for regression and in Kerkyacharian, Picard and Tribouley (1996) for den- 
sity estimation. But these block thresholding methods are not local, so they 
do not enjoy a high degree of spatial adaptivity. Hall, Kerkyacharian and 
Picard (1999) introduced a local blockwise hard thresholding procedure for 
density estimation with a block size of the order $(\log n)^2$ where n is the
sample size. Cai (1991) considered blockwise James-Stein rules and investigated
the effect of block size and threshold level on adaptivity using an
oracle inequality approach. In particular it was shown that a block size of
order $\log n$ is optimal in the sense that it leads to an estimator which is both
globally and locally adaptive. Cai and Silverman (2001) considered over- 
lapping block thresholding estimators and Chicken and Cai (2005) applied 
block thresholding to density estimation. 

The block size and threshold level play important roles in the performance 
of a block thresholding estimator. The local block thresholding methods 
mentioned above all have a fixed block size and threshold, and the same thresholding
rule is applied to all resolution levels regardless of the distribution of
the wavelet coefficients. In the present paper, we propose a data-driven ap- 
proach to empirically select both the block size and threshold at individual 
resolution levels. At each resolution level, the procedure, SureBlock, chooses 
the block size and threshold by minimizing Stein's Unbiased Risk Estimate 
(SURE). By empirically selecting both the block size and threshold and 
allowing them to vary from resolution level to resolution level, SureBlock 
has significant advantages over the more conventional wavelet thresholding 
estimators with fixed block sizes. 

Both the numerical performance and asymptotic properties of SureBlock 
are studied in this paper. The SureBlock estimator is completely data-driven 
and easy to implement. A simulation study is carried out and the numer- 
ical results show that SureBlock has superior finite sample performance in 
comparison to the other leading wavelet estimators. More specifically, Sure- 
Block uniformly outperforms both VisuShrink and SureShrink [Donoho and 
Johnstone (1994a, 1995)] in all 42 simulation cases in terms of the average 
squared error. The SureBlock procedure is better than BlockJS [Cai (1991)] in
37 out of 42 cases. 

The theoretical properties of SureBlock are considered in the Besov space
formulation that is by now classical for the analysis of wavelet methods.
Besov spaces, denoted by $B^\alpha_{p,q}$ and defined in Section 5, are a very rich class
of function spaces which contain functions of inhomogeneous smoothness.
The theoretical results show that SureBlock automatically adapts to the
sparsity of the underlying wavelet coefficient sequence and enjoys excellent
adaptivity over a wide range of Besov bodies. In particular, in the "dense
case" $p \ge 2$ the SureBlock estimator is sharp adaptive over all Besov bodies
$B^\alpha_{p,q}(M)$ with $p = q = 2$ and adaptively achieves within a factor of 1.25 of the
minimax risk over Besov bodies $B^\alpha_{p,q}(M)$ for all $p \ge 2$, $q \ge 2$. At the same
time the SureBlock estimator achieves simultaneously within a constant factor
of the minimax risk over a wide collection of Besov bodies $B^\alpha_{p,q}(M)$ in the
"sparse case" $p < 2$. These properties are not shared simultaneously by many
commonly used fixed block size procedures such as VisuShrink [Donoho and 
Johnstone (1994a)], SureShrink [Donoho and Johnstone (1995)] or BlockJS 
[Cai (1991)]. 

The paper is organized as follows. In Section 2 we introduce the SureBlock
method for the multivariate normal mean problem and derive oracle inequalities
for the SureBlock estimator. The results developed in this section provide
motivations and necessary technical tools for SureBlock in the wavelet
regression setting. In Section 3, after a brief review of wavelets, the SureBlock
procedure for the nonparametric regression is proposed. Section 4 discusses
numerical implementation and compares the numerical performance of SureBlock
with those of VisuShrink [Donoho and Johnstone (1994a)], SureShrink
[Donoho and Johnstone (1995)] and BlockJS [Cai (1991)]. Asymptotic properties
of the SureBlock estimator are presented in Section 5. The proofs are
given in Section 6. 

2. Estimation of a normal mean. As mentioned in the Introduction, 
through an orthogonal discrete wavelet transform (DWT) the nonparametric 
regression problem can be turned into a problem of estimating the wavelet 
coefficients at individual resolution levels. The function estimation proce- 
dure as well as the analysis of the estimator become clear once the problem 
of estimating the wavelet coefficients at a given resolution level is well un- 
derstood. In this section we shall treat the estimation problem at a single 
resolution level by considering a more generic problem, that of estimating 
the mean of a multivariate normal variable. 

Suppose that we observe 

(2) $x_i = \theta_i + z_i$, $z_i \overset{\text{i.i.d.}}{\sim} N(0,1)$, $i = 1, 2, \ldots, d$,

and wish to estimate the mean vector $\theta = (\theta_1, \ldots, \theta_d)$ based on the observations
$x = (x_1, \ldots, x_d)$ under the average mean squared error

(3) $R(\hat\theta, \theta) = d^{-1} \sum_{i=1}^{d} E(\hat\theta_i - \theta_i)^2$.

This normal mean problem occupies a central position in statistical estimation
theory. Many methods have been introduced in the literature. In
this section, with the application to wavelet function estimation in mind,
we estimate the mean $\theta$ by a blockwise James-Stein estimator with block
size $L$ and threshold level $\lambda$ chosen empirically by minimizing SURE. Oracle
inequalities are developed in Section 2.2.

2.1. SureBlock Procedure. A block thresholding procedure thresholds the 
observations in groups and makes simultaneous decisions on all the means 
within a block. Let $L \ge 1$ be the possible length of each block, and $m = d/L$
be the number of blocks. (For simplicity we shall assume that d is divisible
by L in the following discussion.) Fix a block size $L$ and a threshold level
$\lambda$ and divide the observations $x_1, x_2, \ldots, x_d$ into blocks of size $L$. Let
$x_b = (x_{(b-1)L+1}, \ldots, x_{bL})$ represent the observations in the b-th block, and similarly
$\theta_b = (\theta_{(b-1)L+1}, \ldots, \theta_{bL})$ and $z_b = (z_{(b-1)L+1}, \ldots, z_{bL})$. Let $S_b^2 = \|x_b\|_2^2$ for
$b = 1, 2, \ldots, m$. The blockwise James-Stein estimator is given by

(4) $\hat\theta_b(\lambda, L) = \left(1 - \frac{\lambda}{S_b^2}\right)_+ x_b$, $b = 1, 2, \ldots, m$,

where $\lambda \ge 0$ is the threshold level. Block thresholding estimators depend on
the choice of the block size $L$ and threshold level $\lambda$, which largely determine
the performance of the resulting estimator. It is thus important to choose $L$
and $\lambda$ in an optimal way.
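The blockwise James-Stein rule (4) can be sketched in a few lines of Python. This is only an illustration under the assumptions of the text (unit noise variance, d divisible by L); the function name `block_james_stein` is hypothetical and not part of any package used in the paper.

```python
def block_james_stein(x, lam, L):
    """Blockwise James-Stein rule (4): each block x_b is multiplied by
    (1 - lam / S_b^2)_+, where S_b^2 is the squared Euclidean norm of the block."""
    if len(x) % L != 0:
        raise ValueError("d must be divisible by the block size L")
    estimate = []
    for start in range(0, len(x), L):
        block = x[start:start + L]
        s2 = sum(v * v for v in block)                 # S_b^2
        factor = max(0.0, 1.0 - lam / s2) if s2 > 0 else 0.0
        estimate.extend(factor * v for v in block)     # keep or kill the whole block
    return estimate


x = [3.0, 4.0, 0.1, -0.2]
est = block_james_stein(x, lam=5.0, L=2)
```

In this toy input the first block has $S_b^2 = 25 > \lambda$ and is shrunk by the factor 0.8, while the second block has $S_b^2 = 0.05 \le \lambda$ and is set to zero as a whole.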

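We shall select the block size $L$ and threshold level $\lambda$ by empirically
minimizing SURE. Write $\hat\theta_b(\lambda, L) = x_b + g(x_b)$, where $g$ is a function from
$\mathbb{R}^L$ to $\mathbb{R}^L$. Stein (1981) showed that when $g$ is weakly differentiable, then

$E_{\theta_b} \|\hat\theta_b(\lambda, L) - \theta_b\|_2^2 = E_{\theta_b} \{L + \|g\|_2^2 + 2 \nabla \cdot g\}$.

In our case, $g(x_b) = (1 - \lambda / S_b^2)_+ x_b - x_b$ is weakly differentiable. Simple calculations
show $E_{\theta_b} \|\hat\theta_b(\lambda, L) - \theta_b\|_2^2 = E_{\theta_b}(\mathrm{SURE}(x_b, \lambda, L))$, where

(5) $\mathrm{SURE}(x_b, \lambda, L) = L + \frac{\lambda^2 - 2\lambda(L-2)}{S_b^2} I(S_b^2 > \lambda) + (S_b^2 - 2L) I(S_b^2 \le \lambda)$.

This implies that the total risk $E_\theta \|\hat\theta(\lambda, L) - \theta\|_2^2 = E_\theta \mathrm{SURE}(x, \lambda, L)$, where

(6) $\mathrm{SURE}(x, \lambda, L) = \sum_{b=1}^{m} \mathrm{SURE}(x_b, \lambda, L)$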

is an unbiased risk estimate. Our estimator is constructed through a hybrid
method. Set $T_d = d^{-1} \sum_{i=1}^{d} (x_i^2 - 1)$, $\gamma_d = d^{-1/2} \log_2^{3/2} d$ and $\lambda^F = 2L \log d$. Let
$(\lambda^*, L^*)$ denote the minimizers of SURE with an additional restriction on
the search range

(7) $(\lambda^*, L^*) = \arg\min_{\max\{L-2,0\} \le \lambda \le \lambda^F,\ 1 \le L \le d^{1/2}} \mathrm{SURE}(x, \lambda, L)$.

Define the estimator $\hat\theta^*(x)$ of $\theta$ by

(8) $\hat\theta^*_b = \hat\theta_b(\lambda^*, L^*)$ if $T_d > \gamma_d$, and $\hat\theta^*_i = \left(1 - \frac{2 \log d}{x_i^2}\right)_+ x_i$ if $T_d \le \gamma_d$.

We shall call this estimator the SureBlock estimator. When $T_d \le \gamma_d$ the
estimator is a degenerate block James-Stein estimator with block size $L = 1$.
In this case the estimator is also called the nonnegative garrote estimator.
See Breiman (1995) and Gao (1998). The SURE approach has also been 
used for the selection of the threshold level in fixed block size procedures:
for term-by-term thresholding ($L = 1$) in Donoho and Johnstone (1995) and
for block thresholding ($L = \log n$) in Chicken (2005).
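The "simple calculations" behind the blockwise SURE formula (5) can be written out as follows. This is a sketch that only uses Stein's identity quoted above and the definition of the blockwise James-Stein rule, with $S_b^2 = \|x_b\|_2^2$ and $\lambda$ denoting the threshold level.

```latex
\begin{align*}
\text{If } S_b^2 > \lambda:\quad
  & g(x_b) = -\frac{\lambda}{S_b^2}\,x_b, \qquad
    \|g\|_2^2 = \frac{\lambda^2}{S_b^2}, \qquad
    \nabla\cdot g = -\lambda\Bigl(\frac{L}{S_b^2} - \frac{2\|x_b\|_2^2}{S_b^4}\Bigr)
                  = -\frac{\lambda(L-2)}{S_b^2}, \\
  & L + \|g\|_2^2 + 2\,\nabla\cdot g
    = L + \frac{\lambda^2 - 2\lambda(L-2)}{S_b^2}. \\[4pt]
\text{If } S_b^2 \le \lambda:\quad
  & g(x_b) = -x_b, \qquad \|g\|_2^2 = S_b^2, \qquad \nabla\cdot g = -L, \\
  & L + \|g\|_2^2 + 2\,\nabla\cdot g = L + (S_b^2 - 2L).
\end{align*}
```

Combining the two cases with indicator functions gives the expression for the blockwise unbiased risk estimate.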

Remark. The hybrid scheme is used to guard against situations of ex- 
treme sparsity of the mean vector. See also Donoho and Johnstone (1995) 
and Johnstone (1999). 
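The SURE criterion and the hybrid switch can be sketched as below. This is a minimal illustration, assuming unit noise variance, d divisible by L, and that the sparse branch uses the garrote threshold $2 \log d$ with block size 1; the names `sure` and `hybrid_estimate` are hypothetical, and the pair `(lam_star, L_star)` is assumed to have been obtained beforehand by minimizing SURE.

```python
import math


def sure(x, lam, L):
    """Stein's unbiased risk estimate for the blockwise James-Stein rule,
    summed over all blocks of length L (unit noise variance assumed)."""
    total = 0.0
    for start in range(0, len(x), L):
        s2 = sum(v * v for v in x[start:start + L])    # S_b^2
        if s2 > lam:
            total += L + (lam * lam - 2.0 * lam * (L - 2)) / s2
        else:
            total += s2 - L                             # L + (S_b^2 - 2L)
    return total


def hybrid_estimate(x, lam_star, L_star):
    """Hybrid rule: use (lam_star, L_star) unless the signal looks extremely
    sparse (T_d <= gamma_d), in which case fall back to L = 1, lam = 2 log d."""
    d = len(x)
    t_d = sum(v * v - 1.0 for v in x) / d
    gamma_d = d ** -0.5 * math.log2(d) ** 1.5
    lam, L = (lam_star, L_star) if t_d > gamma_d else (2.0 * math.log(d), 1)
    out = []
    for start in range(0, d, L):
        block = x[start:start + L]
        s2 = sum(v * v for v in block)
        factor = max(0.0, 1.0 - lam / s2) if s2 > 0 else 0.0
        out.extend(factor * v for v in block)
    return out


x = [3.0, 4.0, 0.1, -0.2]
risk_estimate = sure(x, lam=5.0, L=2)
dense_fit = hybrid_estimate(x, lam_star=5.0, L_star=2)
sparse_fit = hybrid_estimate([0.1, 0.1, 0.1, 0.1], lam_star=5.0, L_star=2)
```

The second call illustrates the guard: when all observations are small, $T_d$ falls below $\gamma_d$ and the term-by-term rule is used regardless of the supplied block size.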

2.2. Oracle inequalities. We shall now consider the performance of the 
SureBlock estimator by comparing with that of ideal "estimators" equipped 
with an oracle. An oracle does not reveal the true estimand, but does "know" 
the optimal choice within a class of estimators. These ideal "estimators" 
are not true estimators in the statistical sense because the oracle depends 
on the unknown estimand. But the oracle risk of the "ideal estimators" 
provides a benchmark for the performance of estimators. It is desirable to 
have statistical estimators which can mimic the performance of the oracle. 
We shall consider two oracles: block thresholding oracle and linear shrinkage 
oracle. The oracle inequalities developed in this section are useful for showing 
the adaptivity results of SureBlock in the wavelet estimation setting. In 
particular, these results will be used in the proof of Theorem 3 given in 
Section 5. 

Block thresholding oracle. Within the class of block thresholding estimators,
there is an "ideal estimator" which uses the optimal block size
and threshold level so that the risk is minimized. The block thresholding
oracle does not tell the true mean $\theta$, but "knows" the values of the ideal
parameters,

(9) $(\lambda^o, L^o) = \arg\min_{0 \le \lambda,\ 1 \le L \le d^{1/2}} r(\lambda, L) = \arg\min_{\max\{L-2,0\} \le \lambda,\ 1 \le L \le d^{1/2}} r(\lambda, L)$,

where $r(\lambda, L) = d^{-1} E\|\hat\theta(\lambda, L) - \theta\|_2^2$. Denote by $R_{\text{block.oracle}}(\theta)$ the oracle risk
of the ideal block thresholding estimator $\hat\theta(\lambda^o, L^o)$, that is,

(10) $R_{\text{block.oracle}}(\theta) = r(\lambda^o, L^o) = \inf_{\max\{L-2,0\} \le \lambda,\ 1 \le L \le d^{1/2}} r(\lambda, L)$.

Linear shrinkage oracle. Linear shrinkage estimators have been commonly
used in estimating a normal mean. A linear shrinker takes the form
$\hat\theta = \gamma x$ where $0 \le \gamma \le 1$ is the shrinkage factor. The linear shrinkage oracle
"knows" the ideal shrinkage factor $\gamma^*$ which equals $\|\theta\|_2^2 / (\|\theta\|_2^2 + d)$. Simple
calculations show that the risk of the ideal linear "estimator" $\hat\theta = \gamma^* x$ is
given by

$R_{\text{linear.oracle}}(\theta) = \frac{\|\theta\|_2^2}{\|\theta\|_2^2 + d}$.
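The linear oracle risk can be verified directly. The following sketch assumes the unit-variance normal mean model (2) and the average mean squared error (3), and writes out the risk of a fixed linear shrinker, its minimizer and the resulting minimum.

```latex
\begin{align*}
R(\gamma x, \theta)
  &= d^{-1} E\|\gamma x - \theta\|_2^2
   = d^{-1}\bigl[\gamma^2 d + (1-\gamma)^2 \|\theta\|_2^2\bigr], \\
\frac{\partial}{\partial \gamma} R(\gamma x, \theta) = 0
  &\iff \gamma^* = \frac{\|\theta\|_2^2}{\|\theta\|_2^2 + d}, \\
R(\gamma^* x, \theta)
  &= d^{-1}\,\frac{d\,\|\theta\|_2^2}{\|\theta\|_2^2 + d}
   = \frac{\|\theta\|_2^2}{\|\theta\|_2^2 + d}.
\end{align*}
```

The first line uses that the variance term contributes $\gamma^2$ per coordinate and the squared bias contributes $(1-\gamma)^2 \theta_i^2$.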

The following oracle inequalities show that the SureBlock estimator mim- 
ics the performance of both the block thresholding oracle and the linear 
shrinkage oracle. 

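Theorem 1. Let $\{x_i, i = 1, \ldots, d\}$ be given as in (2) and let $\hat\theta^*$ be the
SureBlock estimator defined in (8).

(a) (Block thresholding oracle.) For some constant $c > 0$,

(11) $R(\hat\theta^*, \theta) \le R_{\text{block.oracle}}(\theta) + c d^{-1/4} (\log d)^{5/2}$ for all $\theta \in \mathbb{R}^d$.

(b) (Linear shrinkage oracle.) For some constant $c > 0$,

$R(\hat\theta^*, \theta) \le R_{\text{linear.oracle}}(\theta) + c d^{-1/4} (\log d)^{5/2}$ for all $\theta \in \mathbb{R}^d$.

(c) Set $\mu_d = \|\theta\|_2^2 / d$ and $\gamma_d = d^{-1/2} \log_2^{3/2} d$. There exists some constant
$c > 0$ such that for all $\theta$ satisfying $\mu_d \le \frac{1}{3} \gamma_d$,

(12) $R(\hat\theta^*, \theta) \le d^{-1} \sum_{i} (\theta_i^2 \wedge 2 \log d) + c d^{-1} (\log d)^{-1/2}$.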

Part (c) of Theorem 1 gives a risk bound of the SureBlock estimator in
the case of $\theta$ being in a neighborhood of the origin. This bound is technically
useful later for analysis in the wavelet function estimation setting. Parts (a)
and (c) of Theorem 1 can be regarded as a generalization of Theorem 4
of Donoho and Johnstone (1995) from a fixed block size of one to variable
block size. This generalization is important because it enables the resulting
SureBlock estimator to be not only adaptively rate optimal over a wide
collection of Besov bodies across both the dense ($p \ge 2$) and sparse ($p < 2$)
cases, but also sharp adaptive over spaces where linear estimators can be
asymptotically minimax. This property is not shared by fixed block size
procedures such as VisuShrink, SureShrink and BlockJS, or the empirical
Bayes estimator introduced in Johnstone and Silverman (2005).

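In addition to the oracle inequalities given in Theorem 1, it is also interesting
to consider the properties of the SureBlock estimator over $\ell_p$ balls,

(13) $\Theta_p(\tau) = \{\theta \in \mathbb{R}^d : \|\theta\|_p \le \tau\}$.

Theorem 2. Let $\{x_i, i = 1, \ldots, d\}$ be given as in (2) with $d \ge 4$ and let
$\hat\theta^*$ be the SureBlock estimator defined in (8).

(a) (Adaptivity for dense signals.) For some constant $c > 0$ and for all
$p \ge 2$,

$\sup_{\theta \in \Theta_p(\tau)} R(\hat\theta^*, \theta) \le \frac{\tau^2}{\tau^2 + d^{2/p}} + c d^{-1/4} (\log d)^{5/2}$.

(b) (Adaptivity for moderately sparse signals.) For some constant $c > 0$ and
for all $1 \le p \le 2$,

$\sup_{\theta \in \Theta_p(\tau)} R(\hat\theta^*, \theta) \le c d^{-1} \tau^p (\log(d \tau^{-p}))^{1-p/2} + c d^{-1/4} (\log d)^{5/2}$.

(c) (Adaptivity for very sparse signals.) For $0 < p \le 2$ and $\tau < \frac{1}{\sqrt{3}} d^{1/4} \log_2^{3/4} d$,
there is a constant $c > 0$ such that

$\sup_{\theta \in \Theta_p(\tau)} R(\hat\theta^*, \theta) \le d^{-1} \tau^2 + c d^{-1} (\log d)^{-1/2}$.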

3. The SureBlock procedure for wavelet regression. Let $\{\phi, \psi\}$ be a pair
of compactly supported father and mother wavelets with $\int \phi = 1$. Dilation
and translation of $\phi$ and $\psi$ generate an orthonormal wavelet basis with an
associated orthogonal DWT which transforms sampled data into the wavelet
coefficient domain. A wavelet $\psi$ is called r-regular if $\psi$ has r vanishing
moments and r continuous derivatives. See Daubechies (1992) and Strang
(1992) for details on the DWT and compactly supported wavelets.

For simplicity in exposition, we work with periodized wavelet bases on
[0,1]. Let

$\phi^p_{j,k}(x) = \sum_{l=-\infty}^{\infty} \phi_{j,k}(x - l)$, $\psi^p_{j,k}(x) = \sum_{l=-\infty}^{\infty} \psi_{j,k}(x - l)$ for $x \in [0, 1]$,

where $\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k)$ and $\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k)$. The collection
$\{\phi^p_{j_0,k}, k = 1, \ldots, 2^{j_0}; \psi^p_{j,k}, j \ge j_0 \ge 0, k = 1, \ldots, 2^j\}$ is then an orthonormal
basis of $L^2[0, 1]$, provided $j_0$ is large enough to ensure that the support of
the wavelets at level $j_0$ is not the whole of [0, 1]. The superscript "p" will be
suppressed from the notation for convenience. A square-integrable function
$f$ on [0, 1] can be expanded into a wavelet series

(14) $f(x) = \sum_{k=1}^{2^{j_0}} \xi_{j_0,k} \phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k=1}^{2^j} \theta_{j,k} \psi_{j,k}(x)$,

where $\xi_{j_0,k} = \langle f, \phi_{j_0,k} \rangle$ are the coefficients of the father wavelets at the
coarsest level which represent the gross structure of the function $f$, and
$\theta_{j,k} = \langle f, \psi_{j,k} \rangle$ are the wavelet coefficients which represent finer and finer
structures as the resolution level j increases.

Suppose we observe $y = (y_1, \ldots, y_n)'$ as in (1) and suppose the sample size
$n = 2^J$ for some integer $J > 0$. We use the standard device of the DWT to
turn the function estimation problem into a problem of estimating wavelet
coefficients. Let $\tilde Y = W \cdot n^{-1/2} y$ be the DWT of $n^{-1/2} y$. Then $\tilde Y$ can be
written as

(15) $\tilde Y = (\tilde\xi_{j_0,1}, \ldots, \tilde\xi_{j_0,2^{j_0}}, \tilde y_{j_0,1}, \ldots, \tilde y_{j_0,2^{j_0}}, \ldots, \tilde y_{J-1,1}, \ldots, \tilde y_{J-1,2^{J-1}})'$,

where $j_0$ is some fixed primary resolution level. Here $\tilde\xi_{j_0,k}$ are the gross
structure terms, and $\tilde y_{j,k}$ are the empirical wavelet coefficients at level j
which represent fine structure at scale $2^j$. Since the DWT is an orthogonal
transform, the $\tilde y_{j,k}$ are independent normal variables with standard deviation
$\sigma_n = n^{-1/2} \sigma$. The mean of $\tilde y_{j,k}$, denoted by $\tilde\theta_{j,k}$, is the DWT of the sampled
function $\{n^{-1/2} f(i/n)\}$. Note that $\tilde\theta_{j,k}$ equals, approximately, the true wavelet
coefficient $\theta_{j,k}$ of $f$. The approximation error is given in Lemma 4 in Section
6. Through the DWT, the nonparametric regression problem is then turned
into a problem of estimating a high-dimensional normal mean vector.

3.1. SureBlock for wavelet regression. We now return to the nonparametric
regression model (1). Denote by $\tilde Y_j = \{\tilde y_{j,k} : k = 1, \ldots, 2^j\}$ and
$\theta_j = \{\theta_{j,k} : k = 1, \ldots, 2^j\}$ the empirical and true wavelet coefficients of the
regression function $f$ at resolution level j. We apply the SureBlock procedure
developed in Section 2 to the empirical wavelet coefficients $\tilde Y_j$ level by level
and then use the inverse DWT to obtain the estimate of the regression function.
More specifically the SureBlock procedure for wavelet regression has
the following steps.

1. Transform the data into the wavelet domain via DWT: $\tilde Y = W \cdot n^{-1/2} y$.

2. At each resolution level j, estimate the wavelet coefficients using SureBlock,
that is,

(16) $\hat\theta_j = \sigma_n \cdot \hat\theta^*(\sigma_n^{-1} \tilde Y_j)$,

where $\sigma_n = n^{-1/2} \sigma$ and $\hat\theta^*$ is the SureBlock estimator given in (8). The
estimate of the whole function $f$ is given by

(17) $\hat f^*(t) = \sum_{k=1}^{2^{j_0}} \tilde\xi_{j_0,k} \phi_{j_0,k}(t) + \sum_{j=j_0}^{J-1} \sum_{k=1}^{2^j} \hat\theta_{j,k} \psi_{j,k}(t)$.

3. The function at the sample points $f = \{f(i/n) : i = 1, \ldots, n\}$ is estimated
by the inverse transform of the denoised wavelet coefficients: $\hat f = W^{-1} \cdot n^{1/2} \hat\Theta$.

This procedure is easy to implement and has good numerical performance.
Theoretical results given in Section 5 show that the data-driven choice of
block size and threshold is optimal in the
sense that the resulting estimator adaptively attains the exact minimax
block thresholding risk asymptotically.
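The three steps above can be sketched as a small pipeline. This is an illustration only: it substitutes the orthonormal Haar transform for the periodized compactly supported wavelets used in the paper, keeps the coarse-level coefficients unchanged, and takes the level-wise normal-means rule as an argument `shrink` (where the SureBlock estimator would be plugged in). All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(v, j0):
    """Orthonormal Haar DWT of a list of length 2^J. Returns (coarse, details),
    where details[0] is level j0 and details[-1] is the finest level J-1."""
    approx, details = list(v), []
    while len(approx) > 2 ** j0:
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_idwt(coarse, details):
    """Inverse of haar_dwt: rebuild the signal from coarse to fine levels."""
    approx = list(coarse)
    for w in details:
        approx = [u for s, c in zip(approx, w)
                  for u in ((s + c) / SQRT2, (s - c) / SQRT2)]
    return approx


def wavelet_denoise(y, sigma, shrink, j0=0):
    """Step 1: DWT of n^{-1/2} y. Step 2: apply `shrink` to each level after
    rescaling to unit noise variance. Step 3: invert and rescale by n^{1/2}."""
    n = len(y)
    if n & (n - 1):
        raise ValueError("sample size must be a power of two")
    sigma_n = sigma / math.sqrt(n)
    coarse, details = haar_dwt([v / math.sqrt(n) for v in y], j0)
    shrunk = [[sigma_n * t for t in shrink([c / sigma_n for c in level])]
              for level in details]
    return [math.sqrt(n) * v for v in haar_idwt(coarse, shrunk)]


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
keep_all = wavelet_denoise(y, 1.0, lambda z: z)                  # no shrinkage
kill_all = wavelet_denoise(y, 1.0, lambda z: [0.0] * len(z))     # remove all detail
```

With the identity rule the data are reproduced, and with all detail coefficients set to zero (and `j0 = 0`) the fit collapses to the sample mean, which gives a quick sanity check of the transform pair.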

4. Implementation and numerical results. We now turn to the numerical
performance of SureBlock. Proposition 1 below shows that for a given block
size $L$ it suffices to search over the finite set $\mathcal{A}$ for the threshold $\lambda$ which
minimizes $\mathrm{SURE}(x, \lambda, L)$. This makes the implementation of the SureBlock
procedure easy. The result is also useful for the derivation of the theoretical
results of SureBlock.

Proposition 1. Let $x_i$, $i = 1, \ldots, d$, and $\mathrm{SURE}(x, \lambda, L)$ be given as in
(2) and (6), respectively. Let the block size $L$ be given. Then the minimizer $\hat\lambda$
of $\mathrm{SURE}(x, \lambda, L)$ is an element of the set $\mathcal{A}$ where $\mathcal{A} = \{x_i^2; 1 \le i \le d\} \cup \{0\}$
if $L = 1$, and $\mathcal{A} = \{S_i^2; S_i^2 \ge L - 2, 1 \le i \le m\} \cup \{L - 2\}$ if $L \ge 2$.
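Proposition 1 turns the minimization of SURE into a finite search. The sketch below illustrates this under stated assumptions: unit noise variance, only block sizes that divide d (matching the simplifying assumption of Section 2.1), and candidates capped at $2L \log d$ as in the restricted search range (7). The function names are hypothetical.

```python
import math


def sure(x, lam, L):
    """Stein's unbiased risk estimate summed over blocks of length L."""
    total = 0.0
    for start in range(0, len(x), L):
        s2 = sum(v * v for v in x[start:start + L])
        if s2 > lam:
            total += L + (lam * lam - 2.0 * lam * (L - 2)) / s2
        else:
            total += s2 - L
    return total


def threshold_candidates(x, L):
    """Finite candidate set of Proposition 1, restricted to
    max(L - 2, 0) <= lam <= 2 L log d."""
    d = len(x)
    lam_f = 2.0 * L * math.log(d)
    lo = float(max(L - 2, 0))
    s2 = [sum(v * v for v in x[i:i + L]) for i in range(0, d, L)]
    return [lo] + [s for s in s2 if lo <= s <= lam_f]


def select_block_size_and_threshold(x):
    """Return (minimal SURE, threshold, block size) over the finite search grid."""
    d = len(x)
    best = None
    for L in range(1, math.isqrt(d) + 1):
        if d % L:
            continue                      # simplification: L must divide d
        for lam in threshold_candidates(x, L):
            value = sure(x, lam, L)
            if best is None or value < best[0]:
                best = (value, lam, L)
    return best


x = [5.0, 4.0, -6.0, 5.5, 0.3, -0.2, 0.1, 0.4,
     -0.5, 0.2, 0.0, -0.1, 0.3, 0.1, -0.3, 0.2]
sure_min, lam_star, L_star = select_block_size_and_threshold(x)
```

The reason the finite set suffices is that, for fixed L, SURE is an increasing quadratic in the threshold between consecutive values of $S_b^2$ once the threshold exceeds $L - 2$, so the minimum over each such interval sits at its left endpoint.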

The noise level $\sigma$ is assumed to be known in Section 2. In practice $\sigma$
needs to be estimated. As in Donoho and Johnstone (1994a) we estimate
$\sigma$ based on the empirical coefficients at the highest resolution level by
$\hat\sigma = \frac{1}{0.6745} \mathrm{median}(|n^{1/2} \tilde y_{J-1,k}| : 1 \le k \le 2^{J-1})$.
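The noise estimate is a rescaled median absolute value of the finest-level coefficients. A minimal sketch follows, assuming the usual normalizing constant 0.6745 (approximately the 0.75 quantile of the standard normal distribution) and coefficients computed from $n^{-1/2} y$ as in Section 3; the function name is hypothetical.

```python
import math
import statistics


def estimate_sigma(finest_level_coeffs, n):
    """Median-based noise estimate from the empirical wavelet coefficients
    at the highest resolution level, rescaled by n^{1/2}."""
    scaled = [abs(math.sqrt(n) * c) for c in finest_level_coeffs]
    return statistics.median(scaled) / 0.6745


sigma_hat = estimate_sigma([0.5, -1.0, 2.0, -0.25], n=4)
```

Using the median rather than the sample standard deviation keeps the estimate insensitive to the few large coefficients that carry signal at the finest level.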

We now compare the numerical performance of SureBlock with that of 
VisuShrink [Donoho and Johnstone (1994a)], SureShrink [Donoho and John- 
stone (1995)] and BlockJS [Cai (1991)]. VisuShrink thresholds empirical 
wavelet coefficients individually with a fixed threshold level. SureShrink is 
a soft thresholding procedure which selects the threshold at each resolution 
level by minimizing Stein's unbiased risk estimate. BlockJS is a block thresholding
procedure with a fixed block size $\log n$ and a fixed threshold level.
Each of these wavelet estimators has been shown to perform well numerically 
as well as theoretically. For further details see the original papers. 

Six test functions, representing different levels of spatial variability, and 
various sample sizes, wavelets and signal to noise ratios are used for a sys- 
tematic comparison of the four wavelet procedures. The test functions are 
plotted in the Appendix. Sample sizes ranging from n = 256 to n = 16384 
and signal-to-noise ratios (SNR) from 3 to 7 were considered. The SNR is 
the ratio of the standard deviation of the function values to the standard 
deviation of the noise. Different combinations of wavelets and signal-to-noise 
ratios yield basically the same results. For reasons of space, we only report 
here the results for one particular case, using Daubechies' wavelet Symmlet 8
and SNR = 7. See Cai and Zhou (2005) for additional simulation results. We
use the package WaveLab 802 for the simulations, with the procedures MultiVisu
(for VisuShrink) and MultiHybrid (for SureShrink)
(see http://www-stat.stanford.edu/~wavelab/).

Figure 1 reports the average squared errors (ASE) over 50 replications for
the four thresholding estimators. SureBlock consistently outperforms both
VisuShrink and SureShrink in all 42 simulation cases in terms of the ASE.
The SureBlock procedure is better than BlockJS about 88% of the time (37 out
of 42 cases). SureBlock fails to dominate BlockJS only for the test function
"Doppler." For n = 16384 the risk ratio of SureBlock to BlockJS is
0.013/0.011 ≈ 1.18 and additional simulations show that the risk ratio goes
to 1 as the sample size increases. The main reason for BlockJS outperforming
SureBlock in the case of "Doppler" is that at each resolution level the few
significant wavelet coefficients all cluster together and this special structure
greatly increases the accuracy of BlockJS. On the other hand, SureBlock
is invariant to permutations of wavelet coefficients at any resolution level.
Although SureBlock does not dominate BlockJS for "Doppler," the improvement
of SureBlock over BlockJS is significant for other test functions. The
simulation results show that, by empirically choosing the block size and
threshold and allowing them to vary from resolution level to resolution level,
the SureBlock estimator has significant numerical advantages over thresholding
estimators with fixed block size $L = 1$ (VisuShrink or SureShrink) or
$L = \log n$ (BlockJS). These numerical findings are consistent with the
theoretical results given in Section 5.
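The two quantities that drive the simulation design can be sketched as follows. This is an illustration under assumptions not spelled out in the text: the standard deviation of the function values is taken as the population standard deviation over the sample points, and the ASE is taken as the mean of the squared errors at the sample points. The function names are hypothetical.

```python
import statistics


def noise_sd_for_snr(f_values, snr):
    """Noise standard deviation that gives the requested signal-to-noise ratio,
    where SNR = sd(function values) / sd(noise)."""
    return statistics.pstdev(f_values) / snr


def average_squared_error(f_hat, f_values):
    """Average squared error of an estimate at the sample points."""
    return sum((a - b) ** 2 for a, b in zip(f_hat, f_values)) / len(f_values)


sigma = noise_sd_for_snr([0.0, 2.0, 4.0, 6.0], snr=7)
ase = average_squared_error([1.0, 2.0], [1.0, 4.0])
```

Averaging the ASE of each estimator over replications and dividing by the corresponding average for SureBlock would give ratios of the kind plotted in Figure 1.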

Figure 2 shows an example of SureBlock applied to a noisy Bumps signal. 
The left panel is the noisy signal; the middle panel displays the empirical 
wavelet coefficients arranged according to resolution levels; and the right panel
is the SureBlock reconstruction (solid line) and the true signal (dotted line).
In this example the block sizes chosen by SureBlock are 2, 3, 1, 5, 3, 5 and 
1 from the resolution level j = 3 to level j = 9. 

[Figure 1: bar charts. Recovered panel titles: "SureShrink vs. SureBlock", "BlockJS vs. SureBlock". Horizontal axis: Doppler, HeaviSine, Bumps, Blocks, Piece Regular, Piece Polynomial.]

Fig. 1. The vertical bars represent the ratios of the ASEs of estimators to the corresponding
ASE of SureBlock. The higher the bar the better the relative performance of SureBlock.
The bars are plotted on a log scale and are truncated at the value 2 of the original ratio.
For each signal the bars are ordered from left to right by the sample sizes (n = 256 to
16384).

Fig. 2. SureBlock procedure applied to a noisy Bumps signal.

In addition to the comparison with other wavelet estimators, it is also
instructive to compare the performance of SureBlock with the oracle risk
$\sum_{j,k} (\theta_{j,k}^2 \wedge \sigma^2/n)$, where $\theta_{j,k}$ are the true wavelet coefficients. Furthermore, to
examine the advantage of empirically selecting block sizes, we compare the
ASE of SureBlock with that of an estimator we call SureGarrote which empirically
chooses the threshold at each level but fixes the block size $L = 1$.
Figure 3 summarizes the numerical results for Doppler and Bumps with
n = 1024, SNR ranging from 1 to 15 and 100 replications. SureBlock consistently
outperforms SureGarrote in all cases. The ASE of SureGarrote is
up to 40 percent higher than the corresponding ASE of SureBlock (see the
right panels in Figure 3). Furthermore, the risk of SureBlock is within a small
factor of the corresponding oracle risk. For Doppler the ratios of the ASE of
SureBlock to the oracle risk are between 2 and 2.7, and for Bumps the ratios
are between 1.5 and 2.2 (see the left panels in Figure 3). In these simulations
the block sizes chosen by SureBlock vary from 1 to 16, depending on the
resolution levels. This example shows that the SureBlock procedure works
well relative to the ideal oracle risk and that empirically selecting block sizes
improves the performance noticeably relative to the SureGarrote procedure. It
would be interesting to carry out a more extensive numerical study to compare
the performance of SureBlock with many other procedures including
the empirical Bayes estimator of Johnstone and Silverman (2005). We leave
this to future work.

5. Theoretical properties of SureBlock. We now turn to the theoretical
properties of SureBlock for the nonparametric regression problem (1) under
the integrated mean squared error $R(\hat f, f) = E\|\hat f - f\|_2^2$. The asymptotic
results show that the SureBlock procedure is strongly adaptive.

Besov spaces are a very rich class of function spaces and contain as special
cases many traditional smoothness spaces such as Hölder and Sobolev
spaces. Roughly speaking, the Besov space $B^\alpha_{p,q}$ contains functions having $\alpha$
bounded derivatives in $L^p$ norm; the third parameter $q$ gives a finer gradation
of smoothness. Full details of Besov spaces are given, for example, in
Triebel (1983) and DeVore and Lorentz (1993). For a given r-regular mother
wavelet $\psi$ with $r > \alpha$ and a fixed primary resolution level $j_0$, the Besov
sequence norm $\|\cdot\|_{b^\alpha_{p,q}}$ of the wavelet coefficients of a function $f$ is then defined
by

(18) $\|f\|_{b^\alpha_{p,q}} = \|\xi_{j_0}\|_p + \left( \sum_{j=j_0}^{\infty} (2^{js} \|\theta_j\|_p)^q \right)^{1/q}$,

where $\xi_{j_0}$ is the vector of the father wavelet coefficients at the primary
resolution level $j_0$, $\theta_j$ is the vector of the wavelet coefficients at level j, and
$s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$. Note that the Besov function norm of index $(\alpha, p, q)$ of
a function $f$ is equivalent to the sequence norm (18) of the wavelet coefficients
of the function. See Meyer (1992). The Besov body $B^\alpha_{p,q}(M)$ is defined
by $B^\alpha_{p,q}(M) = \{f : \|f\|_{b^\alpha_{p,q}} \le M\}$. The minimax risk of estimating $f$ over the
Besov body $B^\alpha_{p,q}(M)$ is

(19) $R^*(B^\alpha_{p,q}(M)) = \inf_{\hat f} \sup_{f \in B^\alpha_{p,q}(M)} E\|\hat f - f\|_2^2$.

Donoho and Johnstone (1998) show that the minimax risk $R^*(B^\alpha_{p,q}(M))$
converges to 0 at the rate of $n^{-2\alpha/(1+2\alpha)}$ as $n \to \infty$.

Fig. 3. Left panels: the ratios of the ASE of SureBlock and the oracle risk. Right panels:
the ratios of the ASE of SureGarrote and the ASE of SureBlock.

The blockwise James-Stein estimation of the wavelet coefficients and the
corresponding function $f$ is determined by the block size $L_j$ and threshold
level $\lambda_j$ of each resolution level j. Let $L = (L_j)_{j \ge j_0}$ with $1 \le L_j \le 2^{j/2}$, and
$\lambda = (\lambda_j)_{j \ge j_0}$ with $\lambda_j \ge 0$. Let $\hat f_{L,\lambda}$ be the corresponding estimator of $f$. The
minimax risk among all block James-Stein estimators with all possible block
sizes $L$ and threshold levels $\lambda$ is

(20) $R^*_T(B^\alpha_{p,q}(M)) = \inf_{\lambda, L} \sup_{f \in B^\alpha_{p,q}(M)} E\|\hat f_{L,\lambda} - f\|_2^2$

and equivalently

$R^*_T(B^\alpha_{p,q}(M)) = \inf_{\lambda, L} \sup_{\theta \in B^\alpha_{p,q}(M)} E \sum_{j=j_0}^{\infty} \|\hat\theta_j(\lambda_j, L_j) - \theta_j\|_2^2$.

We shall call $R^*_T(B^\alpha_{p,q}(M))$ the minimax block thresholding risk. It is clear
that $R^*_T(B^\alpha_{p,q}(M)) \ge R^*(B^\alpha_{p,q}(M))$. Theorems 4 and 5 below show that $R^*_T(B^\alpha_{p,q}(M))$
is within a small constant factor of the minimax risk $R^*(B^\alpha_{p,q}(M))$. The
following theorem shows that SureBlock adaptively attains the exact minimax
block thresholding risk $R^*_T(B^\alpha_{p,q}(M))$ asymptotically over a wide range of
Besov bodies.

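Theorem 3. Suppose the mother wavelet $\psi$ is r-regular. Let $\hat f^*$ be the
SureBlock estimator of $f$ defined in (17). Then

(21) $\sup_{f \in B^\alpha_{p,q}(M)} E\|\hat f^* - f\|_2^2 \le R^*_T(B^\alpha_{p,q}(M)) (1 + o(1))$

for $1 \le p, q \le \infty$, $0 < M < \infty$, and $r > \alpha > 4(\frac{1}{p} - \frac{1}{2})_+ + \frac{1}{2}$ with $\frac{2\alpha^2 - 1/6}{1 + 2\alpha} > 1/p$.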

Theorem 3 is proved in Section 6. The main technical tools for the proof 
are the oracle inequalities for SureBlock developed in Section 2.2. 

Theorems 4 and 5 below make it clear that the SureBlock procedure is
indeed nearly optimally adaptive over a wide collection of Besov bodies
$B^\alpha_{p,q}(M)$ including both the dense ($p \ge 2$) and sparse ($p < 2$) cases. The
estimator is asymptotically sharp adaptive over Besov bodies with $p = q = 2$
in the sense that it adaptively attains both the optimal rate and optimal
constant. Over Besov bodies with $p \ge 2$ and $q \ge 2$ SureBlock adaptively
achieves within a factor of 1.25 of the minimax risk. At the same time the
maximum risk of the estimator is simultaneously within a constant factor
of the minimax risk over a collection of Besov bodies $B^\alpha_{p,q}(M)$ in the sparse
case of $p < 2$.

Theorem 4. Suppose $\psi$ is r-regular. (i) SureBlock is adaptively sharp
minimax over Besov bodies $B^\alpha_{2,2}(M)$ for all $M > 0$ and $r > \alpha > 0.88$, that is,

(22) $\sup_{f \in B^\alpha_{2,2}(M)} E\|\hat f^* - f\|_2^2 \le R^*(B^\alpha_{2,2}(M)) (1 + o(1))$.

(ii) SureBlock is adaptively, asymptotically within a factor of 1.25 of the
minimax risk over Besov bodies $B^\alpha_{p,q}(M)$,

(23) $\sup_{f \in B^\alpha_{p,q}(M)} E\|\hat f^* - f\|_2^2 \le 1.25 R^*(B^\alpha_{p,q}(M)) (1 + o(1))$

for all $p \ge 2$, $q \ge 2$, $M > 0$ and $\frac{2\alpha^2 - 1/6}{1 + 2\alpha} > 1/p$ with $r > \alpha > 1/2$.

For the sparse case p < 2, the SureBlock estimator is also simultaneously 
within a small constant factor of the minimax risk. 

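Theorem 5. Suppose $\psi$ is r-regular. SureBlock is asymptotically minimax
up to a constant factor $G(p \wedge q)$ over a large range of Besov bodies with
$1 \le p, q \le \infty$, $0 < M < \infty$, and $r > \alpha > 4(\frac{1}{p} - \frac{1}{2})_+ + \frac{1}{2}$ with $\frac{2\alpha^2 - 1/6}{1 + 2\alpha} > 1/p$.
That is,

(24) $\sup_{f \in B^\alpha_{p,q}(M)} E\|\hat f^* - f\|_2^2 \le G(p \wedge q) \cdot R^*(B^\alpha_{p,q}(M)) (1 + o(1))$,

where $G(p \wedge q)$ is a constant depending only on $p \wedge q$.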

6. Proofs. Throughout this section, without loss of generality, we shall 
assume the noise level $\sigma = 1$. We first prove Theorem 1 and then use it as
the main tool to prove Theorem 3. The proofs of Theorems 4 and 5 and 
Proposition 1 are given later. 

6.1. Notation and preparatory results. Before proving the main theo- 
rems, we need to introduce some notation and collect a few technical re- 
sults. The proofs of some of these preparatory results are long. For reasons 
of space these proofs are omitted here. We refer interested readers to Cai 
and Zhou (2005) for the complete proofs. 

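Consider the normal mean problem (2) with $\sigma = 1$. For a given block size $L$
and threshold level $\lambda$, set $r_b(\lambda, L) = E_{\theta_b}\|\hat\theta_b(\lambda, L) - \theta_b\|_2^2$ and define
$r(\lambda, L) = \frac{1}{d}\sum_{b=1}^{m} r_b(\lambda, L) = E\,D(\lambda, L)$, where
$D(\lambda, L) = \frac{1}{d}\sum_{b=1}^{m}\|\hat\theta_b(\lambda, L) - \theta_b\|_2^2$. Set

\[
R(\theta) = \inf_{\lambda \le \lambda^F,\ 1 \le L \le d^{1/2}} r(\lambda, L) = \inf_{\max\{L-2,0\} \le \lambda \le \lambda^F,\ 1 \le L \le d^{1/2}} r(\lambda, L). \tag{25}
\]

The difference between $R(\theta)$ and $R_{\mathrm{block.oracle}}(\theta)$ defined in (10) is that the
search range for the threshold $\lambda$ in $R(\theta)$ is restricted to be at most $\lambda^F$. The
result given below shows that the effect of this restriction is negligible for
any block size $L$.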

Lemma 1. For any fixed $\eta > 0$, there exists a constant $C_\eta > 0$ such that
for all $\theta \in \mathbb{R}^d$,

\[
R(\theta) - R_{\mathrm{block.oracle}}(\theta) \le C_\eta d^{\eta - 1/2}.
\]

The following lemma is adapted from Donoho and Johnstone (1995) and
is used in the proof of Theorem 1.

Lemma 2. Let $T_d = d^{-1}\sum_i (x_i^2 - 1)$ and $\mu_d = d^{-1}\|\theta\|_2^2$. If $\gamma_d^2 d/\log d \to \infty$,
then

\[
\sup_{\mu_d \ge 3\gamma_d} (1 + \mu_d)P(T_d \le \gamma_d) = o(d^{-1/2}).
\]

We also need the following bound for the loss of the SureBlock estimator.
This bound is used in the proof of Theorem 3.

Lemma 3. Let $\{x_i : i = 1, \ldots, d\}$ be given as in (2). Then

\[
\|\hat\theta^* - \theta\|_2^2 \le 4d\log d + 2\|z\|_2^2. \tag{26}
\]
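The proof of Lemma 3 is omitted in the text. The following is only a sketch of one way a bound of this form can be obtained, under two assumptions that are not spelled out at this point: every block estimate has the form $\hat\theta_b = (1 - \lambda/S_b^2)_+ x_b$ with $S_b^2 = \|x_b\|_2^2$, and every threshold in use satisfies $\lambda \le \lambda^F = 2L\log d$.

```latex
\begin{align*}
\|x_b - \hat\theta_b\|_2^2
  &= \min\!\left(\frac{\lambda}{S_b^2},\, 1\right)^{2} S_b^2
   \le \lambda \le 2L\log d, \\
\|\hat\theta - x\|_2^2
  &\le \frac{d}{L}\cdot 2L\log d = 2d\log d, \\
\|\hat\theta - \theta\|_2^2
  &\le 2\|\hat\theta - x\|_2^2 + 2\|x - \theta\|_2^2
   \le 4d\log d + 2\|z\|_2^2 .
\end{align*}
```

The first line splits into the cases $S_b^2 > \lambda$, where the loss is $\lambda^2/S_b^2 < \lambda$, and $S_b^2 \le \lambda$, where the block is set to zero and the loss is $S_b^2 \le \lambda$.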

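Finally we develop a key technical result for the proof of Theorem 1. Set

\[
U(\lambda, L) = \frac{1}{d}\,\mathrm{SURE}(x, \lambda, L)
= 1 + \frac{1}{d}\sum_{b=1}^{m}\left(\frac{\lambda^2 - 2\lambda(L-2)}{S_b^2}\,I(S_b^2 > \lambda) + (S_b^2 - 2L)\,I(S_b^2 \le \lambda)\right).
\]

Note that both $D(\lambda, L)$ and $U(\lambda, L)$ have expectation $r(\lambda, L)$.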
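As an illustration of the quantity $U(\lambda, L)$, the sketch below evaluates it for a data vector and searches a grid for its minimizer over $\max\{L-2,0\} \le \lambda \le 2L\log d$ and $1 \le L \le d^{1/2}$. The function names, the restriction to block sizes that divide $d$, and the uniform grid over $\lambda$ are assumptions made for this example only; they are not taken from the paper.

```python
import math

def block_sums(x, L):
    """Squared norms S_b^2 of consecutive blocks of length L (assumes L divides len(x))."""
    return [sum(v * v for v in x[s:s + L]) for s in range(0, len(x), L)]

def sure_u(x, lam, L):
    """U(lam, L) = SURE(x, lam, L) / d for the block James-Stein rule."""
    d = len(x)
    total = 0.0
    for s2 in block_sums(x, L):
        if s2 > lam:
            total += (lam * lam - 2.0 * lam * (L - 2)) / s2
        else:
            total += s2 - 2.0 * L
    return 1.0 + total / d

def minimize_u(x, n_grid=50):
    """Grid search (an assumption of this sketch) for the minimizer of U.

    Returns (U value, lam, L) with max(L-2, 0) <= lam <= 2 L log d and 1 <= L <= sqrt(d).
    """
    d = len(x)
    best = None
    L = 1
    while L * L <= d:
        if d % L == 0:  # only block sizes that divide d, for simplicity
            lo, hi = max(L - 2, 0), 2.0 * L * math.log(d)
            for i in range(n_grid + 1):
                lam = lo + (hi - lo) * i / n_grid
                u = sure_u(x, lam, L)
                if best is None or u < best[0]:
                    best = (u, lam, L)
        L += 1
    return best
```

With $\lambda = 0$ and $L = 1$ nothing is shrunk and the function returns 1, the per-coordinate risk of the raw observations; with a very large threshold every block is set to zero and it returns $d^{-1}\sum_i (x_i^2 - 1)$.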

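The goal is to show that the minimizer $(\lambda^S, L^S)$ of $U(\lambda, L)$ is asymptotically
the ideal threshold level and block size. The key step is to show that
$\Delta_d = |E\,D(\lambda^S, L^S) - \inf_{\lambda,L} r(\lambda, L)|$ is negligible for $\max\{L-2, 0\} \le \lambda \le \lambda^F$
and $1 \le L \le d^{1/2}$. Note that for two functions $g$ and $h$ defined on the same domain,
$|\inf_x g(x) - \inf_x h(x)| \le \sup_x |g(x) - h(x)|$. Hence,
$|U(\lambda^S, L^S) - \inf_{\lambda,L} r(\lambda, L)| = |\inf_{\lambda,L} U(\lambda, L) - \inf_{\lambda,L} r(\lambda, L)| \le \sup_{\lambda,L}|U(\lambda, L) - r(\lambda, L)|$
and consequently

\[
\begin{aligned}
\Delta_d &= \Big|E\Big\{D(\lambda^S, L^S) - r(\lambda^S, L^S) + r(\lambda^S, L^S) - U(\lambda^S, L^S) + U(\lambda^S, L^S) - \inf_{\lambda,L} r(\lambda, L)\Big\}\Big| \\
&\le E\sup_{\lambda,L}|D(\lambda, L) - r(\lambda, L)| + 2E\sup_{\lambda,L}|r(\lambda, L) - U(\lambda, L)|.
\end{aligned} \tag{27}
\]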
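The elementary inequality $|\inf_x g(x) - \inf_x h(x)| \le \sup_x|g(x) - h(x)|$ invoked above is stated without proof. A short derivation, added here only as a worked step, runs as follows.

```latex
\begin{align*}
g(x) &\le h(x) + \sup_y |g(y) - h(y)| \quad \text{for every } x, \\
\inf_x g(x) &\le \inf_x h(x) + \sup_y |g(y) - h(y)|, \\
\inf_x h(x) &\le \inf_x g(x) + \sup_y |g(y) - h(y)| \quad \text{(by symmetry)}, \\
\Big|\inf_x g(x) - \inf_x h(x)\Big| &\le \sup_y |g(y) - h(y)| .
\end{align*}
```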

The upper bounds for the two terms on the RHS of (27) are given as follows.

Proposition 2. Let $\lambda^F = 2L\log d$. Uniformly in $\theta \in \mathbb{R}^d$, we have

\[
E_\theta \sup_{\max\{L-2,0\} \le \lambda \le \lambda^F,\ 1 \le L \le d^{1/2}} |U(\lambda, L) - r(\lambda, L)| \le c\,d^{-1/4}(\log d)^{5/2}, \tag{28}
\]

\[
E_\theta \sup_{\max\{L-2,0\} \le \lambda \le \lambda^F,\ 1 \le L \le d^{1/2}} |D(\lambda, L) - r(\lambda, L)| \le c\,d^{-1/4}(\log d)^{5/2}. \tag{29}
\]

The following result, which is crucial for the proof of Theorem 5, plays
a role similar to that of Proposition 13 in Donoho and Johnstone (1994b).

Proposition 3. Let $X \sim N(\mu, 1)$ and let $\mathcal{F}_p(\eta)$ denote the probability
measures $F(d\mu)$ satisfying $\int |\mu|^p F(d\mu) \le \eta^p$. Let
$r(\delta^g_\lambda, \eta) = \sup_{\mathcal{F}_p(\eta)}\{E_F r_g(\mu) : \int |\mu|^p F(d\mu) \le \eta^p\}$,
where $r_g(\mu) = E_\mu(\delta^g_\lambda(x) - \mu)^2$ and $\delta^g_\lambda(x) = (1 - \frac{\lambda^2}{x^2})_+ x$. Let
$p \in (0, 2)$ and $\lambda = \sqrt{2\log \eta^{-p}}$, then $r(\delta^g_\lambda, \eta) \le 2\eta^p\lambda^{2-p}(1 + o(1))$ as $\eta \to 0$.

The following lemma bounds the approximation errors between the mean
of the empirical wavelet coefficient and the true wavelet coefficient of
$f \in B^{\alpha}_{p,q}(M)$. Set $\beta = \alpha - 1/p$, which is positive under the assumption
$\frac{2\alpha^2 - 1/6}{1 + 2\alpha} > 1/p$ in Theorems 3, 4 and 5.

Lemma 4. Let $\tilde\theta = (\tilde\theta_{j,k})$ be the DWT of the sampled function $\{n^{-1/2} f(\frac{i}{n})\}$
with $n = 2^J$ and let $\theta_{j,k} = \int f(x)\psi_{j,k}(x)\,dx$. Then $\|\tilde\theta - \theta\|_2^2 \le Cn^{-2\beta}$.

Remark 1. Lemma 4 implies 



A, 

(30) 

= {l + o{l))Wr{B^^^{M)) 
and for p > 2 and q>2 

(31) 

oo 

= (l + o(l)) sup VV^ 

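under the assumption $\frac{2\alpha^2 - 1/6}{1 + 2\alpha} > 1/p$, which implies $\beta > \alpha/(2\alpha + 1)$. The
argument for (30) is as follows. Write

\[
E\sum_{j=j_0}^{\infty}\|\hat\theta_j(\lambda_j, L_j) - \tilde\theta_j\|_2^2 - E\sum_{j=j_0}^{\infty}\|\hat\theta_j(\lambda_j, L_j) - \theta_j\|_2^2
= \|\tilde\theta - \theta\|_2^2 + 2E\sum_{j=j_0}^{\infty}\langle \hat\theta_j(\lambda_j, L_j) - \theta_j,\ \theta_j - \tilde\theta_j\rangle.
\]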

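From Lemma 4, $\|\tilde\theta - \theta\|_2^2 \le Cn^{-2\beta} = o(n^{-2\alpha/(2\alpha+1)})$. The Cauchy-Schwarz
inequality implies

\[
\sup\Big|E\sum_{j=j_0}^{\infty}\langle \hat\theta_j(\lambda_j, L_j) - \theta_j,\ \theta_j - \tilde\theta_j\rangle\Big|
\le \sup\|\tilde\theta - \theta\|_2 \cdot \Big(\sup E\sum_{j=j_0}^{\infty}\|\hat\theta_j(\lambda_j, L_j) - \theta_j\|_2^2\Big)^{1/2},
\]

which is $o(n^{-2\alpha/(2\alpha+1)})$, since $\|\tilde\theta - \theta\|_2 = O(n^{-\beta})$ with $\beta > \alpha/(2\alpha + 1)$ and
$R_T(B^{\alpha}_{p,q}(M)) \le Cn^{-2\alpha/(1+2\alpha)}\log n$ from Cai (1999) in which $L_j = \log n$ and
$\lambda_j = 4.505\log n$. We know $R_T(B^{\alpha}_{p,q}(M)) \ge R^*(B^{\alpha}_{p,q}(M)) \ge Cn^{-2\alpha/(1+2\alpha)}$ from
Donoho and Johnstone (1998). Thus (30) is established. The argument for
(31) is similar.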

In the following proofs we will denote by C a generic constant that may 
vary from place to place. 

6.2. Proof of Theorem 1. The proof of Theorem 1 is similar to but more 
involved than those of Theorem 4 of Donoho and Johnstone (1995) and 
Theorem 2 of Johnstone (1999) because of variable block size. 

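Set $T_d = d^{-1}\sum_i (x_i^2 - 1)$ and $\gamma_d = d^{-1/2}\log_2^{3/2} d$. Define the event
$A_d = \{T_d \le \gamma_d\}$ and decompose the risk of SureBlock into two parts:

\[
R(\hat\theta^*, \theta) = d^{-1}E_\theta\{\|\hat\theta^* - \theta\|_2^2\,I(A_d)\} + d^{-1}E_\theta\{\|\hat\theta^* - \theta\|_2^2\,I(A_d^c)\}
= R_{1,d}(\theta) + R_{2,d}(\theta).
\]

We first consider $R_{1,d}(\theta)$. On the event $A_d$, the signal is sparse and $\hat\theta^*$
is the nonnegative garrote estimator $\hat\theta^F_i = (1 - 2\log d/x_i^2)_+ x_i$ by the definition
of $\hat\theta^*$ in (8). Decomposing $R_{1,d}(\theta)$ further into two parts with either
$\mu_d = d^{-1}\|\theta\|_2^2 \le 3\gamma_d$ or $\mu_d > 3\gamma_d$ yields that
$R_{1,d}(\theta) \le R_F(\theta)I(\mu_d \le 3\gamma_d) + r_{1,d}(\theta)$, where $R_F(\theta)$ is the risk of the
nonnegative garrote estimator and $r_{1,d}(\theta) = d^{-1}E_\theta\{\|\hat\theta^* - \theta\|_2^2\,I(A_d)\}I(\mu_d > 3\gamma_d)$.
The oracle inequality (3.10) in Cai (1999) with $L = 1$ and $\lambda = \lambda^F = 2\log d$ yields that

\[
R_F(\theta) \le d^{-1}\sum_{i}\big[(\theta_i^2 \wedge \lambda^F) + 4(\pi\log d)^{-1/2}d^{-1}\big]
\le d^{-1}\big[\|\theta\|_2^2 + 4(\pi\log d)^{-1/2}\big]. \tag{32}
\]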

Recall that $\mu_d = \|\theta\|_2^2/d$ and $\gamma_d = d^{-1/2}\log_2^{3/2} d$. It then follows from (32)
that

\[
R_F(\theta)I(\mu_d \le 3\gamma_d) \le d^{-1}\big[3d^{1/2}\log_2^{3/2} d + 4(\pi\log d)^{-1/2}\big] \le c\,d^{-1/4}(\log d)^{5/2}. \tag{33}
\]

Note that on the event $A_d$, $\|\hat\theta^*\|_2^2 \le \|x\|_2^2 \le d + d\gamma_d$ and so

\[
r_{1,d}(\theta) \le 2d^{-1}(E\|\hat\theta^*\|_2^2 + \|\theta\|_2^2)P(A_d)I(\mu_d > 3\gamma_d)
\le 2(1 + 2\mu_d)P(A_d)I(\mu_d > 3\gamma_d) = o(d^{-1/2}),
\]

where the last step follows from Lemma 2. Note that for any $\eta > 0$, Lemma
1 yields that

\[
R(\theta) - R_{\mathrm{block.oracle}}(\theta) \le C_\eta d^{\eta - 1/2} \tag{34}
\]

for some constant $C_\eta > 0$ and for all $\theta \in \mathbb{R}^d$. Equations (27)-(29) yield

\[
R_{2,d}(\theta) - R(\theta) \le \Delta_d \le c\,d^{-1/4}(\log d)^{5/2}. \tag{35}
\]

The proof of part (a) of the theorem is completed by putting together (33)-(35).
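The sparsity test $A_d = \{T_d \le \gamma_d\}$ and the garrote rule used on that event in this proof can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes unit noise level, takes $\gamma_d = d^{-1/2}\log_2^{3/2} d$ with a base-2 logarithm, and uses the garrote threshold $2\log d$ on the squared scale, as these quantities are read from the proof.

```python
import math

def sparsity_statistic(x):
    """T_d = d^{-1} * sum_i (x_i^2 - 1), assuming unit noise level."""
    return sum(v * v - 1.0 for v in x) / len(x)

def sparsity_cutoff(d):
    """gamma_d = d^{-1/2} * (log2 d)^{3/2}."""
    return math.log2(d) ** 1.5 / math.sqrt(d)

def is_sparse(x):
    """True on the event A_d = {T_d <= gamma_d}."""
    return sparsity_statistic(x) <= sparsity_cutoff(len(x))

def garrote(x):
    """Nonnegative garrote (1 - 2 log d / x_i^2)_+ x_i applied coordinatewise."""
    lam = 2.0 * math.log(len(x))
    out = []
    for v in x:
        s2 = v * v
        # Coordinates with x_i^2 <= lam are set to zero; larger ones are shrunk.
        out.append(v * (1.0 - lam / s2) if s2 > lam else 0.0)
    return out
```

A vector whose total energy is close to that of pure noise satisfies `is_sparse`, in which case the garrote is applied; a vector with one or more large coordinates typically fails the test, and the SURE-minimizing block rule would be used instead.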

We now turn to part (b). It follows from part (a) that
$R(\hat\theta^*, \theta) \le R_{\mathrm{block.oracle}}(\theta) + c\,d^{-1/4}(\log d)^{5/2}$. Note that
$R_{\mathrm{block.oracle}}(\theta) \le r(d^{1/2} - 2, d^{1/2})$. Stein's unbiased
risk estimate and Jensen's inequality yield that

r{d'/' - 2, di/2) = V (d'/^ - {d^'^ - 2fE-^ 

b V WKjbW 

Note that E\\Xbf = + d^^^- Hence r{d^/^ - 2, d^/^) < d'^ Ebi^^^^ + 
1 - ||ej|2+rfi/2 )- The elementary inequality (EilLi «i)(EiILi a,"^) > m^, for 
ai > 0,1 < i <m yields that 

Rblock.oracle 



Il'^ll2 

and part (b) then follows. We now consider part (c). Note first that $R_{1,d}(\theta) \le R_F(\theta)$.
On the other hand, $R_{2,d}(\theta) = d^{-1}E\{\|\hat\theta^* - \theta\|_2^2\,I(A_d^c)\} \le Cd^{-1}(E\|\hat\theta^* - \theta\|_2^4)^{1/2}P^{1/2}(A_d^c)$.
To complete the proof of (12), it then suffices to show that
under the assumption $\mu_d \le \frac{1}{3}\gamma_d$, $E\|\hat\theta^* - \theta\|_2^4$ is bounded by a polynomial of
$d$ and $P(A_d^c)$ decays faster than any polynomial of $d^{-1}$. Note that in this
case $\|\theta\|_2^2 = d\mu_d \le d^{1/2}\log_2^{3/2} d$. Since $\|\hat\theta^*\|_2^2 \le \|x\|_2^2$ and $x_i = \theta_i + z_i$,

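\[
\begin{aligned}
E\|\hat\theta^* - \theta\|_2^4 &\le E(2\|\hat\theta^*\|_2^2 + 2\|\theta\|_2^2)^2 \\
&\le E(2\|x\|_2^2 + 2\|\theta\|_2^2)^2 \le E(4\|\theta\|_2^2 + 2\|z\|_2^2)^2 \\
&\le 32\|\theta\|_2^4 + 8E\|z\|_2^4 \le 32d\log_2^3 d + 16d + 8d^2.
\end{aligned}
\]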

On the other hand, it follows from Hoeffding's inequality and Mill's inequality
that

\[
\begin{aligned}
P(A_d^c) &= P\Big(d^{-1}\sum_i (z_i^2 + 2z_i\theta_i + \theta_i^2 - 1) > d^{-1/2}\log_2^{3/2} d\Big) \\
&\le 2\exp(-C\log_2^3 d) + \tfrac{1}{2}\exp(-C\log_2^3 d/\mu_d),
\end{aligned}
\]

which decays faster than any polynomial of $d^{-1}$.

6.3. Proof of Theorem 2. Note that $x/(x + d)$ is increasing in $x$ and for
$p \ge 2$, $\sup_{\theta \in \Theta_p(\tau)}\|\theta\|_2^2 = d^{1-2/p}\tau^2$. Hence



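\[
\begin{aligned}
\sup_{\theta \in \Theta_p(\tau)} R(\hat\theta^*, \theta)
&\le \sup_{\theta \in \Theta_p(\tau)}\frac{\|\theta\|_2^2}{\|\theta\|_2^2 + d} + c\,d^{-1/4}(\log d)^{5/2} \\
&\le \frac{\sup_{\theta \in \Theta_p(\tau)}\|\theta\|_2^2}{\sup_{\theta \in \Theta_p(\tau)}\|\theta\|_2^2 + d} + c\,d^{-1/4}(\log d)^{5/2} \\
&= \frac{d^{1-2/p}\tau^2}{d^{1-2/p}\tau^2 + d} + c\,d^{-1/4}(\log d)^{5/2}.
\end{aligned}
\]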

Now consider part (b). It follows from Proposition 3 that there is a constant
$c_p$ depending on $p$ such that
$R(\theta) \le \inf_{\lambda \ge 0} r(\lambda, 1) \le c_p d^{-1}\tau^p(\log(d\tau^{-p}))^{1 - p/2}$,
since $\Theta_p(\tau) = \{\theta \in \mathbb{R}^d : \|\theta\|_p^p/d \le \tau^p/d\}$. Part (b) now follows directly from
(11) in Theorem 1.

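For part (c) it is easy to check that for $0 < p \le 2$,
$\|\theta\|_2^2 \le \|\theta\|_p^2 \le \tau^2 \le \frac{1}{3}d^{1/2}\log_2^{3/2} d$,
so $\mu_d \le \frac{1}{3}\gamma_d$. It then follows from Theorem 1 and (39) that

\[
\begin{aligned}
R(\hat\theta^*, \theta) &\le R_F(\theta) + c\,d^{-1}(\log d)^{-1/2} \\
&\le d^{-1}\big(\|\theta\|_2^2 + 8(2\log d)^{-1/2}\big) + c\,d^{-1}(\log d)^{-1/2} \\
&\le d^{-1}\tau^2 + c\,d^{-1}(\log d)^{-1/2}.
\end{aligned}
\]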

6.4. Proof of Theorem 3. Again we set $\sigma = 1$. Note that the empirical
wavelet coefficients $y_{j,k}$ can be written as $y_{j,k} = \tilde\theta_{j,k} + n^{-1/2}z_{j,k}$, where
$\tilde\theta_{j,k} = Ey_{j,k}$ are the DWT of the sampled function $\{n^{-1/2}f(\frac{i}{n})\}$ and the $z_{j,k}$ are
i.i.d. $N(0, 1)$. To more conveniently use Theorem 1, we multiply both sides by
$n^{1/2}$ and get

\[
y'_{j,k} = \theta'_{j,k} + z_{j,k}, \qquad j \ge j_0,\ k = 1, 2, \ldots, 2^j, \tag{36}
\]

where $y'_{j,k} = n^{1/2}y_{j,k}$ and $\theta'_{j,k} = n^{1/2}\tilde\theta_{j,k}$. Let $\tilde f = \sum_{i=1}^{n} n^{-1/2}f(\frac{i}{n})\phi_{J,i}(t)$,
where $n = 2^J$. Note that $\sup_{f \in B^{\alpha}_{p,q}(M)}\|\tilde f - f\|_2^2 = o(n^{-2\alpha/(1+2\alpha)})$ from Lemma 4.
To establish Theorem 3, by the Cauchy-Schwarz inequality as in Remark 1,
it suffices to show that

\[
\sup_{f \in B^{\alpha}_{p,q}(M)} E_f\|\hat f^* - \tilde f\|_2^2 \le R_T(B^{\alpha}_{p,q}(M))(1 + o(1)).
\]

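Fix $0 < \varepsilon_0 < 1/(1 + 2\alpha)$ and let $J_0$ be the largest integer satisfying
$2^{J_0} \le n^{\varepsilon_0}$. Write

\[
n^{-1}E_\theta\|\hat\theta'^* - \theta'\|_2^2
= \Big(\sum_{j \le J_0}\sum_k + \sum_{J_0 < j < J_1}\sum_k + \sum_{j \ge J_1}\sum_k\Big)\, n^{-1}E(\hat\theta'^*_{j,k} - \theta'_{j,k})^2
= S_1 + S_2 + S_3,
\]

where $J_1 > J_0$ is to be chosen later. The terms $S_1$ and $S_3$ are identical in
all block thresholding procedures. We thus only focus on the term $S_2$. Since
$0 < \varepsilon_0 < 1/(1 + 2\alpha)$, $n^{\varepsilon_0}n^{-1}\log n = o(n^{-2\alpha/(1+2\alpha)})$. It then follows from (26)
in Lemma 3 that $S_1 \le Cn^{\varepsilon_0}n^{-1}\log n = o(n^{-2\alpha/(1+2\alpha)})$, which is negligible
relative to the minimax risk. On the other hand, it follows from (11) in
Theorem 1 that

\[
\begin{aligned}
S_2 &\le \sum_{J_0 < j < J_1} n^{-1}2^jR(\theta'_j) + \sum_{J_0 < j < J_1} n^{-1}2^jR_F(\theta'_j)I(\mu'_j \le 3\gamma_{2^j}) + \sum_{J_0 < j < J_1} n^{-1}c\,2^{3j/4}j^{5/2} \\
&= S_{21} + S_{22} + S_{23},
\end{aligned}
\]

where $\mu'_j = 2^{-j}\|\theta'_j\|_2^2 = 2^{-j}n\|\tilde\theta_j\|_2^2$ and $\gamma_{2^j} = 2^{-j/2}j^{3/2}$. It follows from
Remark 1 that

\[
S_{21} = \sum_{J_0 < j < J_1} n^{-1}2^jR(\theta'_j) \le (1 + o(1))R_T(B^{\alpha}_{p,q}(M)). \tag{37}
\]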

We shall see that both $S_{22}$ and $S_{23}$ are negligible relative to the minimax
risk. Note that

\[
S_{23} = \sum_{J_0 < j < J_1} n^{-1}c\,2^{3j/4}j^{5/2} \le Cn^{-1}2^{3J_1/4}J_1^{5/2}. \tag{38}
\]

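The oracle inequality (3.10) in Cai (1999) with $L = 1$ and $\lambda = \lambda^F = 2\log(2^j)$
yields that

\[
2^jR_F(\theta'_j) \le \sum_{k=1}^{2^j}\big[(\theta'^2_{j,k} \wedge \lambda^F) + 8(2\log 2^j)^{-1/2}2^{-j}\big]
\le n\|\tilde\theta_j\|_2^2 + 8(2\log 2^j)^{-1/2}. \tag{39}
\]

Recall that $\mu'_j = 2^{-j}n\|\tilde\theta_j\|_2^2$ and $\gamma_{2^j} = 2^{-j/2}j^{3/2}$, so that $n\|\tilde\theta_j\|_2^2 \le 3 \cdot 2^{j/2}j^{3/2}$
whenever $\mu'_j \le 3\gamma_{2^j}$. It then follows from (39) that

\[
S_{22} = \sum_{J_0 < j < J_1} n^{-1}2^jR_F(\theta'_j)I(\mu'_j \le 3\gamma_{2^j})
\le \sum_{J_0 < j < J_1} n^{-1}\big[3 \cdot 2^{j/2}j^{3/2} + 8(2\log 2^j)^{-1/2}\big]. \tag{40}
\]

Hence if $J_1$ satisfies $2^{J_1} = n^{\gamma}$ with some $\gamma < \frac{4}{3(1+2\alpha)}$, then (38) and (40) yield

\[
S_{22} + S_{23} = o(n^{-2\alpha/(1+2\alpha)}). \tag{41}
\]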

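We now turn to the term $S_3$. It is easy to check that for $\theta \in B^{\alpha}_{p,q}(M)$,
$\|\theta_j\|_2^2 \le M^2 2^{-2\alpha' j}$ where $\alpha' = \alpha - (\frac{1}{p} - \frac{1}{2})_+ > 0$. Note that if $J_1$ satisfies
$2^{J_1} = n^{\gamma}$ for some $\gamma > \frac{1}{1/2 + 2\alpha'}$, then for all sufficiently large $n$ and all $j \ge J_1$,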
2~^n\\6j\\2 < \^2i where 72^ = 2~^^'^fl'^ which implies plr^j < ^72^ for j > Ji 
and 7 > 2(1 — 2/3), since 

(42) ^'2, - 2--'n||^j-||^ < 2-%||^j- -^^.||^ < C2-%^-2^ = o(2--''/2j3/2)_ 
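It thus follows from (12) and (39) that

\[
S_3 \le \sum_{j \ge J_1}\big(n^{-1}2^jR_F(\theta'_j) + c\,n^{-1}j^{-1/2}\big)
\le \sum_{j \ge J_1}\big(\|\tilde\theta_j\|_2^2 + Cn^{-1}j^{-1/2}\big)
= o(n^{-2\alpha/(1+2\alpha)}), \tag{43}
\]

when $\gamma > \frac{\alpha}{(1+2\alpha)\alpha'}$ and $\beta > \frac{\alpha}{1+2\alpha}$. Equations (41)-(43) hold by choosing $\gamma$
satisfying

\[
\max\Big\{\frac{\alpha}{(1+2\alpha)\alpha'},\ \frac{1}{1/2 + 2\alpha'},\ 2(1 - 2\beta)\Big\} < \gamma < \frac{4}{3(1+2\alpha)},
\]

which is possible for $\alpha > 4(\frac{1}{p} - \frac{1}{2})_+ + \frac{1}{2}$ with $\frac{2\alpha^2 - 1/6}{1+2\alpha} > 1/p$. This completes
the proof.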
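The existence of an admissible exponent $\gamma$ can be checked numerically. The sketch below assumes the three lower bounds on $\gamma$ collected in this proof are $\alpha/((1+2\alpha)\alpha')$, $1/(1/2 + 2\alpha')$ and $2(1 - 2\beta)$, with upper bound $4/(3(1+2\alpha))$, where $\alpha' = \alpha - (1/p - 1/2)_+$ and $\beta = \alpha - 1/p$. It compares the non-emptiness of that window with the stated conditions $\alpha > 4(1/p - 1/2)_+ + 1/2$ and $(2\alpha^2 - 1/6)/(1 + 2\alpha) > 1/p$. The function names are hypothetical.

```python
def gamma_window(alpha, p):
    """Return (lower, upper) bounds for gamma, or None if alpha' <= 0."""
    excess = max(1.0 / p - 0.5, 0.0)
    alpha_prime = alpha - excess
    beta = alpha - 1.0 / p
    if alpha_prime <= 0:
        return None
    lower = max(
        alpha / ((1 + 2 * alpha) * alpha_prime),  # needed for the tail sum S_3
        1.0 / (0.5 + 2 * alpha_prime),            # needed for the sparsity condition at high levels
        2 * (1 - 2 * beta),                       # needed to control the discretization error
    )
    upper = 4.0 / (3 * (1 + 2 * alpha))           # needed for S_22 + S_23
    return lower, upper

def window_nonempty(alpha, p):
    """True when some gamma satisfies lower < gamma < upper."""
    w = gamma_window(alpha, p)
    return w is not None and w[0] < w[1]

def theorem_conditions(alpha, p):
    """The smoothness conditions as stated in the theorems."""
    excess = max(1.0 / p - 0.5, 0.0)
    return (alpha > 4 * excess + 0.5
            and (2 * alpha ** 2 - 1.0 / 6) / (1 + 2 * alpha) > 1.0 / p)
```

For $p = 2$ the second condition reduces to $2\alpha^2 - \alpha - 2/3 > 0$, that is $\alpha > 0.879\ldots$, which agrees with the threshold 0.88 appearing in Theorem 4(i).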
6.5. Proof of Theorem 4. Define the minimax linear risk by

\[
R_L(B^{\alpha}_{p,q}(M)) = \inf_{\hat f\ \mathrm{linear}}\ \sup_{f \in B^{\alpha}_{p,q}(M)} E\|\hat f - f\|_2^2.
\]

It follows from Donoho, Liu and MacGibbon (1990) and Remark 1 that
$R_L(B^{\alpha}_{p,q}(M)) = R^*(B^{\alpha}_{p,q}(M))(1 + o(1))$ for $p = q = 2$, and
$R_L(B^{\alpha}_{p,q}(M)) \le 1.25\,R^*(B^{\alpha}_{p,q}(M))(1 + o(1))$ for all $p \ge 2$ and $q \ge 2$. It thus suffices to show
that SureBlock asymptotically attains the minimax linear risk for $\alpha > 0$,
$p \ge 2$ and $q \ge 2$. Since $\|\tilde\theta - \theta\|_2^2 = O(n^{-2\beta})$ with $\beta > \alpha/(1 + 2\alpha)$, we need
only to show $\sup_{\theta \in B^{\alpha}_{p,q}(M)} E_\theta\|\hat\theta^* - \tilde\theta\|_2^2 \le R_L(B^{\alpha}_{p,q}(M))(1 + o(1))$, similar to
the arguments in Remark 1.

Recall that in the proof of Theorem 3 it is shown that
$E_\theta\|\hat\theta^* - \tilde\theta\|_2^2 \le S_1 + S_{21} + S_{22} + S_{23} + S_3$, where
$S_1 + S_{22} + S_{23} + S_3 = o(n^{-2\alpha/(2\alpha+1)})$ and
$S_{21} = \sum_{J_0 < j < J_1} n^{-1}2^jR(\theta'_j)$ with $J_0$ and $J_1$ chosen as in the proof of Theorem 3.
Since the minimax risk $R^*(B^{\alpha}_{p,q}(M)) \asymp n^{-2\alpha/(1+2\alpha)}$, this implies that $S_{21}$
is the dominating term in the maximum risk of SureBlock. It follows from
the definition of $R(\theta'_j)$ given in (10) that
$n^{-1}2^jR(\theta'_j) \le n^{-1}\sum_b E\|\hat\theta'_{jb}(L_j - 2, L_j) - \theta'_{jb}\|_2^2$,
where the RHS is the risk of the blockwise James-Stein estimator
with any fixed block size $1 \le L_j \le 2^{j/2}$ and a fixed threshold level
$L_j - 2$. Stein's unbiased risk estimate [see, e.g., Johnstone (2002), Chapter 9.2]
yields that
$n^{-1}\sum_b E\|\hat\theta'_{jb}(L_j - 2, L_j) - \theta'_{jb}\|_2^2 \le \sum_b\Big(\frac{\|\tilde\theta_{jb}\|_2^2 L_j/n}{\|\tilde\theta_{jb}\|_2^2 + L_j/n} + \frac{2}{n}\Big)$.
Hence the maximum risk of SureBlock satisfies



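\[
\begin{aligned}
\sup_{\theta \in B^{\alpha}_{p,q}(M)} E_\theta\|\hat\theta^* - \tilde\theta\|_2^2
&\le \sup_{\theta \in B^{\alpha}_{p,q}(M)} \sum_{J_0 < j < J_1}\sum_b\Big(\frac{\|\tilde\theta_{jb}\|_2^2 L_j/n}{\|\tilde\theta_{jb}\|_2^2 + L_j/n} + \frac{2}{n}\Big)\cdot(1 + o(1)) \\
&\le \sup_{\theta \in B^{\alpha}_{p,q}(M)} \sum_{J_0 < j < J_1}\sum_b\Big(\frac{\|\theta_{jb}\|_2^2 L_j/n}{\|\theta_{jb}\|_2^2 + L_j/n} + \frac{2}{n}\Big)\cdot(1 + o(1)) \\
&= \sup_{\theta \in B^{\alpha}_{p,q}(M)} \Big(\sum_{J_0 < j < J_1}\sum_b\frac{\|\theta_{jb}\|_2^2 L_j/n}{\|\theta_{jb}\|_2^2 + L_j/n} + \sum_{J_0 < j < J_1}\frac{2^j}{L_j}\cdot\frac{2}{n}\Big)\cdot(1 + o(1)),
\end{aligned}
\]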

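where the second inequality follows from a similar argument as in Remark 1.
Note that in the proof of Theorem 3, $J_1$ satisfies $2^{J_1} = n^{\gamma}$ with $\gamma < \frac{4}{3(1+2\alpha)}$.
Hence if $L_j$ satisfies $2^{\rho j} \le L_j \le 2^{j/2}$ for some $\rho > \frac{1}{4}$, then
$\sum_{j < J_1}\frac{2^j}{L_j}\cdot\frac{2}{n} \le Cn^{-1}2^{3J_1/4} = o(n^{-2\alpha/(1+2\alpha)})$ and hence

\[
\sup_{\theta \in B^{\alpha}_{p,q}(M)} E_\theta\|\hat\theta^* - \tilde\theta\|_2^2
\le (1 + o(1))\sup_{\theta \in B^{\alpha}_{p,q}(M)} \sum_{J_0 < j < J_1}\sum_b\frac{\|\theta_{jb}\|_2^2 L_j/n}{\|\theta_{jb}\|_2^2 + L_j/n}. \tag{44}
\]

Note that $\sum_b\frac{\|\theta_{jb}\|_2^2 L_j/n}{\|\theta_{jb}\|_2^2 + L_j/n} = \frac{2^j}{n} - \big(\frac{L_j}{n}\big)^2\sum_b\frac{1}{\|\theta_{jb}\|_2^2 + L_j/n}$. Then the simple
inequality $(\sum_{i=1}^{m} a_i)(\sum_{i=1}^{m} a_i^{-1}) \ge m^2$ for $a_i > 0$, $1 \le i \le m$ yields that

\[
\sum_{J_0 < j < J_1}\sum_b\frac{\|\theta_{jb}\|_2^2 L_j/n}{\|\theta_{jb}\|_2^2 + L_j/n}
\le \sum_{J_0 < j < J_1}\frac{(2^j/n)\sum_k|\theta_{j,k}|^2}{\sum_k|\theta_{j,k}|^2 + 2^j/n}. \tag{45}
\]

Theorem 2 in Cai, Low and Zhao (2000) shows that

\[
\sup_{\theta \in B^{\alpha}_{p,q}(M)} \sum_j\frac{(2^j/n)\sum_k|\theta_{j,k}|^2}{\sum_k|\theta_{j,k}|^2 + 2^j/n} = R_L(B^{\alpha}_{p,q}(M))(1 + o(1)). \tag{46}
\]

The proof is complete by combining (44)-(46).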
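The step from the blockwise sum to the level-wise bound rests only on the inequality $(\sum a_i)(\sum a_i^{-1}) \ge m^2$. The sketch below checks that step numerically for a single resolution level, assuming the blockwise term has the form $s_b\ell/(s_b + \ell)$ with $s_b$ the squared norm of block $b$ and $\ell = L/n$, and the level-wise bound has the form $S \cdot (2^j/n)/(S + 2^j/n)$ with $S$ the total squared norm. Function and variable names are hypothetical.

```python
def blockwise_sum(theta, L, n):
    """Sum over blocks of s_b * (L/n) / (s_b + L/n); assumes L divides len(theta)."""
    ell = L / n
    total = 0.0
    for start in range(0, len(theta), L):
        s = sum(v * v for v in theta[start:start + L])
        total += s * ell / (s + ell)
    return total

def levelwise_bound(theta, n):
    """S * (2^j/n) / (S + 2^j/n), where len(theta) plays the role of 2^j."""
    s = sum(v * v for v in theta)
    level_size = len(theta) / n
    return s * level_size / (s + level_size)
```

The two quantities coincide when every block carries the same energy, and the blockwise sum is strictly smaller when the energy is spread unevenly across blocks.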

6.6. Proof of Theorem 5. Set $\rho_g(\eta) = \inf_\lambda\sup_{\mathcal{F}_p(\eta)} E_F r_g(\mu)$, where $\mathcal{F}_p(\eta)$
and $r_g(\mu)$ are given as in Proposition 3. Proposition 3 implies that
$\rho_g(\eta) \le r(\delta^g_\lambda, \eta) \le 2\eta^p(2\log\eta^{-p})^{(2-p)/2}(1 + o(1))$ as $\eta \to 0$. For $p \in (0, 2)$, Theorem 15
of Donoho and Johnstone (1994b) shows the univariate Bayes minimax risk
satisfies $\rho(\eta) = \inf_\delta\sup_{\mathcal{F}_p(\eta)} E_F E_\mu(\delta(x) - \mu)^2 = \eta^p(2\log\eta^{-p})^{(2-p)/2}(1 + o(1))$
as $\eta \to 0$. Note that $\rho_g(\eta)/\rho(\eta)$ is bounded as $\eta \to 0$ and $\rho_g(\eta)/\rho(\eta) \to 1$ as
$\eta \to \infty$. Both $\rho_g(\eta)$ and $\rho(\eta)$ are continuous on $(0, \infty)$, so
$G(p) = \sup_\eta \frac{\rho_g(\eta)}{\rho(\eta)} < \infty$ for $p \in (0, 2)$. Theorems 4 and 5 in Section 4 of Donoho and Johnstone
(1998) derived the asymptotic minimaxity over Besov bodies from the univariate
Bayes minimax estimators. It then follows from an analogous argument
of Section 5.3 in Donoho and Johnstone (1998) that

\[
R_T(B^{\alpha}_{p,q}(M)) \le \inf_{\lambda_j}\sup_{\theta \in B^{\alpha}_{p,q}(M)} E\sum_j\|\hat\theta_j(\lambda_j, 1) - \theta_j\|_2^2
\le G(p \wedge q)\cdot R^*(B^{\alpha}_{p,q}(M))(1 + o(1)).
\]
APPENDIX: TEST FUNCTIONS

[Figure panels: Blocks, Bumps]

Fig. A.1. Test functions.